Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl

نویسندگان

  • Ximena Gutierrez-Vasques
  • Gerardo Sierra
  • Isaac Hernandez Pompa
چکیده

This paper describes the project called Axolotl which comprises a Spanish-Nahuatl parallel corpus and its search interface. Spanish and Nahuatl are distant languages spoken in the same country. Due to the scarcity of digital resources, we describe the several problems that arose when compiling this corpus: most of our sources were non-digital books, we faced errors when digitizing the sources and there were difficulties in the sentence alignment process, just to mention some. The documents of the parallel corpus are not homogeneous, they were extracted from different sources, there is dialectal, diachronical, and orthographical variation. Additionally, we present a web search interface that allows to make queries through the whole parallel corpus, the system is capable to retrieve the parallel fragments that contain a word or phrase searched by a user in any of the languages. To our knowledge, this is the first Spanish-Nahuatl public available digital parallel corpus. We think that this resource can be useful to develop language technologies and linguistic studies for this language pair.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bilingual lexicon extraction for a distant language pair using a small parallel corpus

The aim of this thesis proposal is to perform bilingual lexicon extraction for cases in which small parallel corpora are available and it is not easy to obtain monolingual corpus for at least one of the languages. Moreover, the languages are typologically distant and there is no bilingual seed lexicon available. We focus on the language pair Spanish-Nahuatl, we propose to work with morpheme bas...

متن کامل

Toward the Elaboration of a Spanish-Chinese Parallel Annotated Corpus

Spanish and Chinese are two very different languages in all language levels. Therefore, translation (both human and machine translation) from one to another and learning one of them as a foreign language are challenging tasks. Some automatic translation systems exist for this pair of languages, but there is enough room to improve the translation quality between Spanish and Chinese. In addition,...

متن کامل

Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor weeping is a means of liberating contained emotions is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...

متن کامل

Computer Systems for Analysis of Nahuatl

This article describes two computer systems that allow us to analyze words written in the Nahuatl language. The main goal is the diffusion and preservation of indigenous language with great historical, linguistic, literary and nationalistic relevance by developing language resources for Nahuatl. One system automatically gets prefixes or suffixes of words from a text written in Nahuatl. This sys...

متن کامل

Learning Spanish-Galician Translation Equivalents Using a Comparable Corpus and a Bilingual Dictionary

So far, research on extraction of translation equivalents from comparable, non-parallel corpora has not been very popular. The main reason was the poor results when compared to those obtained from aligned parallel corpora. The method proposed in this paper, relying on seed patterns generated from external bilingual dictionaries, allows us to achieve similar results to those from parallel corpus...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016